European Journal of Epidemiology

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match European Journal of Epidemiology's content profile, based on 40 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Educational Inequalities in Well-Being in Later Life in Germany: The Role of Health Behaviours and Health Literacy

Franzese, F.; Bergmann, M.; Burzynska, A.

2026-04-24 epidemiology 10.64898/2026.04.22.26351388 medRxiv
Top 0.1%
18.1%

Socioeconomic inequalities in health and well-being are a major public health concern, particularly in ageing populations. Education is a key determinant shaping multiple aspects of health outcomes. We used cross-sectional data from wave 9 of the German sample (n=4,148) of the Survey of Health, Ageing and Retirement in Europe (SHARE) to test whether formal education is associated with well-being in later adulthood, with health literacy, self-rated health, and preventive health behaviours as possible mediators. Our results showed that education was positively associated with greater well-being, but only via indirect pathways. Specifically, self-rated health, health literacy, and fruit and vegetable consumption mediated the relationship between education and well-being, accounting for 54.7, 24.7, and 12.6 percent of the total effect, respectively. In addition, there were significant positive correlations between education and health literacy, as well as high-intensity physical activity, daily fruit and vegetable consumption, more preventive health check-ups, and less smoking. In contrast, alcohol consumption was more common among those with higher levels of education. All health behaviours and health literacy were correlated directly or indirectly (i.e., mediated by health) with well-being. These findings highlight the importance of examining indirect pathways linking education to well-being in later life. Interventions aimed at improving health literacy and promoting healthy behaviours may help reduce educational inequalities in quality of life among older adults.

About the SHARE Working Paper Series: The SHARE Working Paper Series started in 2011 and collects pre-publication versions of papers or book chapters, technical and methodological reports, as well as policy papers based on SHARE data. The working papers are not reviewed by the publisher (SHARE-ERIC), and layout and editing are not standardized. The publisher takes no responsibility for the scientific content of the paper. Working Papers can be updated; a version number is indicated on the front page. Previous versions are available upon request.

2
Long-term within-person variation of routinely measured biomarkers are associated with mortality and cardiovascular health

Webster, A. J.; Drakesmith, C. W.; Perera-Salazar, R.; Steinsaltz, D.; COMPUTE team

2026-05-05 epidemiology 10.64898/2026.05.04.26352236 medRxiv
Top 0.1%
14.5%

Biomarker measurements can assist with disease diagnosis and the assessment of disease risks, with the most recent measurements usually used by disease-risk models. However, a growing number of studies suggest that in addition to a biomarker's value, its inherent variability, estimated from several measurements over many days or years in an individual, can convey independent prognostic information about disease risks. Variance estimates require an individual's biomarker data to have been measured a sufficient number of times, ideally across a long time period, and are usually only available in a hospital setting or clinical trial. Furthermore, a single biomarker measurement will involve a combination of measurement error, natural short-term variation over a daily time period, variation over time periods of weeks and months, and slower age-dependent changes over several years. This paper develops a statistical method that accounts for these latter concerns, and applies it to Clinical Practice Research Datalink (CPRD) data collected by UK General Practitioners. It studies the associations between cardiovascular health outcomes and the within-person variances of eight routinely measured biomarkers. This involved Sequential Monte Carlo modeling to convert an individual's biomarker measurements (collected over months or years) into estimates for the biomarker's mean, linear age-dependent slope, within-person variance, and a variance due to variation on a daily time period or measurement errors. The result is a proof-of-principle that UK primary care Electronic Health Records (from CPRD) can be effectively used for this purpose. After adjusting for mean biomarker values, clear associations were found between mortality or cardiovascular disease risks and within-person variances for 6 of 8 biomarkers.

3
Life Course Socioeconomic Position and health in older adulthood age: A Formal Mediation Analysis in the 1958 British Birth Cohort

Guo, Y.; Pelikh, A.; Ploubidis, G. B.; Goodman, A.

2026-03-25 epidemiology 10.64898/2026.03.23.26349085 medRxiv
Top 0.1%
9.8%

Background: Childhood socioeconomic position (SEP) is a key determinant of later-life health. Understanding the extent to which adult SEP mediates this association into early old age is important for explaining how health inequalities are propagated across generations and how they might be addressed in later life. To our knowledge, no prospective study has examined whether childhood SEP remains associated with health at the threshold of older age and the extent to which any such association is mediated by adult SEP. Methods: We used data from the 1958 British Birth Cohort, a prospective study that has followed participants since birth, drawing on earlier data collected at birth and ages 33 and 55 years and newly collected data from the age 62 sweep. Using interventional causal mediation analyses, we assessed whether adult occupational class, education, housing tenure, and income mediate associations between childhood social class (manual vs non-manual) and health at age 62 (self-rated health, C-reactive protein [CRP], cholesterol ratio, glycated hemoglobin [HbA1c], and N-terminal pro-B-type natriuretic peptide [NT-proBNP]). Findings: Associations between childhood SEP and self-rated health, CRP, cholesterol ratio, and HbA1c persisted after accounting for adult SEP. Mediation was outcome-specific and differed by sex. Among men, occupational class mediated 39% of the association with self-rated health (indirect effect RR 0.90, 95% CI 0.86, 0.95) and education mediated 27% (0.93, 0.90, 0.96). Among women, education mediated 10% (0.95, 0.91, 0.98) and housing tenure mediated 6% (0.97, 0.94, 0.99). Indirect effects for CRP were smaller, and mediation was minimal for cholesterol ratio, HbA1c, and NT-proBNP. Interpretation: Population-level improvements in adult SEP could reduce, but are unlikely to eliminate, later-life health inequalities associated with childhood SEP. 
Reducing these inequalities will require policies that address disadvantage in early life and improve adult financial and employment conditions. Funding: UK Economic and Social Research Council.

4
Mapping the Dynamic Interplay of Mental Health and Weight Across Childhood: Data-Driven Explorations Using Causal Discovery

Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.

2026-04-17 epidemiology 10.64898/2026.04.16.26350943 medRxiv
Top 0.1%
7.2%

Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.

5
Frailty progression following severe infections in adults aged 65 years and above in US and England: two matched cohort studies

Asare, K.; Mansfield, K. E.; Gore-Langton, G. R.; Cadogan, S. L.; Barry, E.; Keogh, R.; Lo Re, V.; Rodriguez-Barradas, M. C.; Justice, A. C.; Rentsch, C. T.; Warren-Gash, C.

2026-03-15 epidemiology 10.64898/2026.03.13.26348319 medRxiv
Top 0.1%
6.9%

Background: We investigated frailty progression after severe infections in adults (≥65 years) in the US and England. Methods: We conducted parallel matched cohort studies using the US Veterans Aging Cohort Study (VACS-National, 2008-2019; median age 74 years; 98% male) and the English Clinical Practice Research Datalink (2006-2019; median age 76 years; 45% male). Adults hospitalised primarily for infection (i.e., severe infection) were matched in calendar date order to individuals without severe infection on age, sex, care site, and, in the US only, race and ethnicity. We measured frailty using VACS Index 2·0 (US) and Electronic Frailty Index (eFI; England). We estimated annual conditional mean frailty differences between adults with versus without severe infection using linear regression adjusting for baseline frailty, demographics, lifestyle factors, infection history, and, in the US only, comorbidities. Results: Mean baseline frailty was higher in those with severe infection than those without (US: 57 v 48; England: 0·17 v 0·12). At Year 1, adjusted mean frailty was higher among adults with severe infections than those without (US: VACS Index +2·0, 95% CI 1·9-2·0; England: eFI +0·005, 95% CI 0·005-0·006). At Years 2-5, adjusted mean frailty remained higher after severe infection; however, compared to Year 1, differences were smaller in the US and larger in England. Effects varied by infection type (strongest for lower respiratory tract infections, meningoencephalitis (UK only), urinary tract infections, and sepsis). Interpretation: Individuals with severe infections had higher frailty at baseline and follow-up than those without. Preventing both frailty and infections is important for improving health in older age. Funding: Wellcome.

Research in context

Evidence before this study: We searched PubMed (inception to October 27, 2025) for published articles evaluating the association between infections and frailty, with no language restrictions. We used the search terms [(infection OR infectious) AND (frailty OR frail)]. We found fifteen observational studies investigating associations between individual infections (including HIV, cytomegalovirus, SARS-CoV-2, acute respiratory infection, urinary tract infection, and influenza) and frailty in adults. Frailty measures varied: eight studies used Fried's phenotype index, six used versions of the cumulative deficit index (i.e., Edmonton Frail Scale, FRAIL-NH Scale, Hospital Frailty Risk Score, Clinical Frailty Score, Veterans Affairs Frailty Index, Vulnerable Elders Survey-13), and one study used the Timed Up and Go Test. Results from identified studies were mixed, with nearly half (7/15) reporting a positive association between the infection studied and frailty, and the remaining eight finding no evidence of association. In cross-sectional analyses, HIV, SARS-CoV-2, cytomegalovirus, and urinary tract infection were each associated with higher mean frailty scores or frailty prevalence. In longitudinal analysis, hospitalisation for acute respiratory infection was followed by higher mean hospital frailty risk scores two years post-discharge. SARS-CoV-2 infection was associated with early onset (i.e., higher hazard) of frailty over three years of follow-up. However, other studies found no association between HIV, SARS-CoV-2, acute respiratory infection, or influenza and frailty prevalence, incidence, or transition between frailty states. These mixed findings may reflect methodological differences between the studies, including variation in frailty measures, and study limitations. Frailty exists along a continuum of vulnerability, and progression after infection may be an important outcome, yet current evidence is scarce. It remains unclear whether severe infections, or different types of infection, are associated with faster frailty deterioration. Similarly, it is uncertain whether post-infection frailty risk varies by pathogen (bacterial, viral, parasitic, fungal), infection type (sepsis, urinary tract infection, skin and soft tissue infection, meningitis/encephalitis, lower respiratory tract, gastroenteritis), or by age, sex, social deprivation, and pre-existing comorbidities.

Added value of this study: Our study compared frailty progression over a five-year period between adults aged ≥65 years with severe infection (hospitalisation primarily due to infection) versus comparators without severe infection. We found higher baseline frailty at severe infection onset than in matched comparators. We saw evidence of increased frailty progression over time in people following severe infections compared to those without; however, these differences were small. We also saw higher risk of worsening frailty progression in older adults and those with dementia. Further, worsening frailty progression varied by infection type (strongest for lower respiratory tract infections, meningoencephalitis (UK only), urinary tract infections, and sepsis).

Implications of all the available evidence: Our findings underscore the importance of both frailty and infection prevention in improving health in older age. Additional studies are required to explore other wider life-course influences on frailty, to guide the development of comprehensive preventive strategies.

6
Simulation-Based Comparison of Controlled Interrupted Time Series (CITS) and Multivariable Regression

ORWA, F. O.; Mutai, C.; Nizeyimana, I.; Mwangi, A.

2026-04-13 health policy 10.64898/2026.04.10.26350670 medRxiv
Top 0.1%
6.4%

When randomized controlled trials are impractical, interrupted time series designs offer a rigorous quasi-experimental approach to assess population level policies. Indeed, in the context of quasi-experimental designs (QEDs), the Interrupted Time Series (ITS) method is commonly thought of as the most robust. But interrupted time series designs are susceptible to serial correlation and confounding by time-varying factors associated with both the intervention and the outcome, which may result in biased inference. Thus, we provide a simulation-based contrast of controlled interrupted time series (CITS) and multivariable regression (multivariable negative binomial regression) for estimation of policy effects in count time series data. These approaches are widely used in policy evaluations, yet their comparative performance in typical population health settings has rarely been examined directly. We tested both approaches within a variety of data generating situations, differing in the series length, intervention effect size, and magnitude of lag-1 autocorrelation. Bias, standard error calibration, confidence interval coverage, mean squared error, and statistical power were assessed for performance. Both methods gave unbiased estimates for moderate and large intervention effects, although bias was more pronounced for small effects, particularly in short series. Although the point estimate performance was similar, inferential properties varied significantly. CITS always had smaller mean squared error, better consistency between model based and empirical standard errors, and confidence interval coverage near the 95% nominal levels over weak to moderate autocorrelation. By contrast, multivariable regression was more sensitive to serial dependence, leading to underestimated standard errors and undercoverage, especially at moderate to high autocorrelation, regardless of Newey-West adjustments. 
These findings show the benefits of using a concurrent control series and the importance of structurally accounting for serial correlation when studying population level policies with time series data.

7
Causal analyses using education-health linked data for England: a case study

De Stavola, B. L. L.; Aparicio Castro, A.; Nguyen, V. G.; Lewis, K. M.; Dearden, L.; Harron, K.; Zylbersztejn, A.; Shumway, J.; Gilbert, R.

2026-03-19 health policy 10.64898/2026.03.13.26348340 medRxiv
Top 0.1%
4.7%

Introduction: This article summarises lessons learnt from the Health Outcomes for young People throughout Education (HOPE) Study and serves as a real-world, transferable application for addressing causal questions using administrative data. The HOPE study applied causal methods to analyses of administrative data in Education and Child Health Insights from Linked Data (ECHILD) aimed at studying the effectiveness of provision for special educational needs and disability (SEND) on health and education outcomes. Methods: Defining causal questions regarding the impact of SEND provision required judicious mapping of the question onto the data, leading to the selection of appropriate measures of effect, transparent handling of the data and control of confounding factors to estimate effects. We adopted the target trial emulation framework to guide these steps. Having encountered specific computational challenges in estimating the effects of interest, we simulated data that resembled the HOPE study and used them to practice the implementation of alternative estimation methods and to study the impact of some of their assumptions. Results: The creation and analysis of the simulated data provided valuable insights. First, we learned the importance of aligning the target of estimation with the causal question at hand. Second, we observed how deviations from assumptions specific to each estimation method can affect results. Third, we highlighted the benefits of employing alternative estimation methods as sensitivity tools that can aid the interpretation of the resulting estimates. Finally, we offer user-friendly code in two programming languages (R and Stata) and accompanying simulated data to facilitate the implementation of these methods for similar causal questions. 
Conclusion: We recommend that users of administrative data fully specify (and possibly revise) the causal questions they wish to address and carefully examine and compare assumptions, implementation and results obtained using alternative estimation methods.

8
The control gap in long COVID research: a meta-epidemiological analysis

Panagiotopoulos, A.-P.; Laskaris, A.; Tsakri, D.; Manoussopoulos, Y.; Anastassopoulou, C.; Tsakris, A.; Ioannidis, J.

2026-05-21 epidemiology 10.64898/2026.05.16.26353381 medRxiv
Top 0.1%
4.3%

Objectives: To quantify the frequency of baseline control-group use in published long COVID prevalence studies and assess their key methodological features. Design: Cross-sectional meta-epidemiological evaluation of published post-acute COVID-19 prevalence studies, supplemented by a corresponding-author survey. Setting: Published studies identified through a systematic review by Hou et al. (2025) and supplementary data obtained through direct email contact with corresponding authors. Participants: A total of 440 published long COVID prevalence studies. Main outcome measures: Presence and type of comparator group, reliance on solely self-reported outcomes, acknowledgment of lack of a control group among uncontrolled studies, and availability of additional comparator data through author survey. Results: Among 440 studies, 372 (84.5%) reported no control group in their publication. Healthy or uninfected comparators were reported in 55 studies (12.5%) and other comparator types in 14 (3.2%); 1 study included both categories. Solely self-reported outcomes were used in 279 studies (63.4%). Among 372 uncontrolled studies, 244 (65.6%) did not explicitly acknowledge the absence of a baseline comparator as a limitation anywhere in the text. Corresponding authors of 140 studies (31.8%) responded to the survey; among them, 126 (90.0%) reported no additional comparative data, while 14 (10.0%) mentioned some available comparative datasets (19 additional datasets). Almost all of that information (10/14, 17/19) had already been published in other articles not captured by the Hou et al. systematic review. Conclusions: Most published long COVID prevalence studies lacked comparator groups and relied exclusively on self-reported outcomes without acknowledging this limitation. Direct author contact identified little additional comparator information. 
Much of the long COVID prevalence literature may therefore be poorly suited to estimating burden attributable specifically to SARS-CoV-2, underscoring the need for appropriately matched comparators and more objective outcome assessment. Registration: The protocol was prospectively registered on the Open Science Framework (https://osf.io/f4hra).

9
Validation of an AI-Assisted Framework for Systematic Bias Assessment in Observational Studies

Etminan, M.; Rezaeianzadeh, R.; Douros, A.

2026-04-28 epidemiology 10.64898/2026.04.26.26351778 medRxiv
Top 0.1%
4.1%

Background: The rapid expansion of medical literature has led to substantial variability and frequent contradictions in study findings, making it increasingly difficult to distinguish meaningful signals from noise. Much of this variability arises from differences in study methodology, where biases such as confounding, selection bias, and reverse causation can drive spurious associations. While artificial intelligence (AI)-assisted tools have been developed to support risk-of-bias assessment, most are designed for systematic reviews and are not tailored to identifying specific epidemiologic biases in observational studies. This highlights the need for structured, scalable approaches to evaluate study validity in real-world evidence. Objective: To develop and validate an AI-assisted, expert-informed, rule-based framework (EpiVise) for systematically identifying and classifying key sources of bias in pharmacoepidemiologic studies, and to assess its agreement with expert evaluation. Methods: We conducted a validation study using recently published pharmacoepidemiologic studies from high-impact journals (post-2025). Each study was independently assessed by the framework and two expert epidemiologists across predefined bias domains, including measured confounding, confounding by indication, selection bias, immortal time bias, and disease latency. Agreement was evaluated using weighted kappa statistics. In the absence of a gold standard, expert judgment served as the reference benchmark. In a second phase, synthetic study scenarios with predefined embedded biases were constructed to assess the framework's ability to detect known bias structures under controlled conditions. 
Results: In analyses of published studies (10 studies; 60 ratings), agreement between the framework and expert assessments was substantial (κ = 0.75; 95% confidence interval [CI], 0.60-0.86), with 12 discordant ratings (20.0%), all limited to adjacent categories and occurring primarily in the confounding by indication and selection bias domains. In synthetic study scenarios (10 studies; 50 ratings), agreement was similarly substantial, with 42 of 50 ratings concordant (84%) and a weighted kappa of 0.77 (95% CI, 0.67-0.87); discordances included both adjacent-category and extreme disagreements and were concentrated in confounding by indication, selection bias, and prevalent user bias domains. Conclusions: This AI-assisted, expert-informed framework, EpiVise, provides a scalable and reproducible approach for evaluating epidemiologic study validity, demonstrating substantial agreement comparable to expert assessment. By systematically identifying key sources of bias, the framework has the potential to enhance the rigor and consistency of evidence evaluation, support peer review, and inform clinical, regulatory, and policy decision-making. Further validation across broader study designs and domains is warranted.
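
For readers unfamiliar with the agreement statistic named in this abstract, the following is a minimal sketch of a weighted Cohen's kappa for two raters on an ordinal scale. It is not the authors' code: the abstract does not say whether linear or quadratic weights were used, so the `weights` parameter, the function name `weighted_kappa`, and the integer coding of categories (0 to `n_categories - 1`) are all assumptions made for illustration.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, n_categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    Ratings are assumed to be integers in 0..n_categories-1 (an
    illustrative coding, not taken from the abstract).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(rater_a)
    power = 1 if weights == "linear" else 2

    def disagreement(i, j):
        # 0 for identical ratings, 1 for ratings at opposite ends of the scale
        return (abs(i - j) / (n_categories - 1)) ** power

    pairs = Counter(zip(rater_a, rater_b))
    marg_a = Counter(rater_a)
    marg_b = Counter(rater_b)

    # Observed weighted disagreement across all rated items
    observed = sum(disagreement(i, j) * c for (i, j), c in pairs.items()) / n
    # Disagreement expected if the two raters were independent
    expected = sum(
        disagreement(i, j) * marg_a[i] * marg_b[j]
        for i in range(n_categories)
        for j in range(n_categories)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed / expected
```

With weights of this kind, an adjacent-category disagreement is penalised less than an extreme one, which is why the abstract distinguishes the two when describing discordant ratings.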

10
Towards reproducible multimorbidity clustering in electronic health records: a transparent pipeline for aligning research aims and methodology

Romero Moreno, G.; Restocchi, V.; De Ferrari, L.; Palmer, J.; Fleuriot, J. D.; Guthrie, B.; Lone, N. I.

2026-05-26 health informatics 10.64898/2026.05.25.26353178 medRxiv
Top 0.1%
3.8%

The availability of electronic health records has facilitated data-driven approaches to the understanding of multimorbidity, with clustering becoming a common tool for uncovering relevant groups of associated conditions. Previous studies, however, have found challenges in their reproducibility, with wide disparity in the reported clusters. At the core of this issue lies a vagueness in the definition of a cluster, leading to a lack of standards in their methods and evaluation, while implementation details are often not completely reported or explicit in their assumptions. We present a methodological pipeline that can be adapted to different cluster definitions (e.g. multiple cluster membership or clusters where all nodes are mutually associated) and a set of scores that can be composed into an evaluation metric that explicitly incorporates assumptions that align with the research aims. We apply our pipeline to a healthcare dataset of over 7 million patients in England and show how clusters may drastically differ when varying the parameter choices, exposing the risks of reporting a single clustering realisation. Our methodological pipeline, evaluation framework, and tools for analysis and network visualisation serve as a reference to transparently explore and align methodological decisions to the aims of multimorbidity clustering, helping to overcome the reproducibility challenges of the field.

11
Keeping human in the loop: A three-phase generative AI workflow for research integrity in data-intensive science. A methodological case study using elite Ethiopian distance-running data

Galko, P.; Yisamaw, A.; Haugen, T.; Seiler, S.

2026-05-29 sports medicine 10.64898/2026.05.29.26354013 medRxiv
Top 0.1%
3.7%

Background: Generative AI tools can support data-intensive research by writing code, drafting prose, searching analytical possibilities, and stress-testing claims. They can also produce false citations, drift between statistical specifications, and lose continuity across long investigations. This paper describes a practical workflow for using AI systems in empirical research while keeping discovery, verification, and accountability inspectable. Methods: We developed and applied a three-phase human-AI workflow to a case study of 14 elite Ethiopian distance runners. The dataset contained 22,605 GPS-segments collected across 97 consecutive days in late 2025, supplemented by venue and athlete metadata collected in the field. Phase 1 used an autonomous data-exploration tool to pre-filter the hypothesis space across five seeded research questions. Phase 2 used an AI system under direct human guidance to construct candidate findings into numerical claims, verification scripts, and draft text. Phase 3 used an independent AI system in an adversarial role to stress-test methods, statistics, prose, figures, and citations. The workflow was informed by Pearl's distinction between association, intervention, and counterfactual reasoning, with human judgement retained for research direction, interpretation, and final claims. Results: The workflow produced three empirical analyses and a documented correction process. The analyses estimated an altitude-to-sea-level pace correction of +0.10 min/km per 1,000 m at matched heart rate, showed why pooled altitude-surface regression was not identifiable within this venue system, documented method-dependence in heart-rate-based intensity classification, characterised within-venue route variation as a 64/36 path-fixed-to-trail-variable split with the Sululta label resolving into two functionally distinct sub-venues, and reframed the cohort's training through a 3x3x3 prescription lattice grounded in Ethiopian coaching practice. 
The adversarial phase identified several hallucinated citations, a terminology error between HC1 and cluster-robust standard errors, and several inconsistencies between prose, figures, and computed results. Verification scripts re-derived nearly all numerical claims from the cleaned lap-level data. Conclusions: The case study shows how researchers can organise AI-assisted empirical work so that candidate discovery, claim construction, independent stress-testing, and final accountability remain separated. The workflow did not remove the need for domain expertise or human judgement. Its value was in making the route from candidate finding to manuscript claim explicit, reproducible, and open to challenge. Trial registration: Not applicable.

12
Adherence to the Eatwell Guide and associations with markers of physical function: A prospective analysis within the UK Biobank cohort

Griffiths, A.; Gregory, S.; Malcomson, F. C.; Cronin, K.; Matu, J.; Ells, L.; Shannon, O. M.

2026-04-28 epidemiology 10.64898/2026.04.27.26351814 medRxiv
Top 0.1%
3.7%

Background: The Eatwell Guide represents the UK's principal healthy eating model, and understanding whether adherence to UK dietary recommendations can attenuate age-related functional decline is essential to inform healthy ageing strategies. Methods: In up to 157,457 participants from the UK Biobank, we explored cross-sectional and prospective associations between adherence to the Eatwell Guide and markers of physical function (grip strength, fat-free mass percentage, self-reported walking pace, and falls). Eatwell Guide adherence scores were derived from 24-hour dietary recall data (Oxford WebQ), and quantified using a graded, food-based scoring system. Differences between population subgroups including by age, sex, physical activity, and protein intake level were explored. Results: Higher Eatwell Guide adherence was cross-sectionally associated with higher grip strength, greater fat-free mass percentage, higher odds of brisk walking pace, and lower odds of falls (all p<0.001). Prospectively, greater adherence was associated with attenuated fat-free mass decline (β=0.02, SE=0.001, p<0.001) and slower grip strength decline (β=0.01, SE=0.002, p<0.01). Higher adherence was also prospectively associated with greater odds of brisk walking pace (OR=1.02, 95% CI: 1.017-1.021, p<0.01), though this advantage attenuated over follow-up (EWG*Time: OR=0.998, 95% CI: 0.997-0.999, p=0.002). Higher adherence was prospectively associated with lower falls risk (OR=0.996, 95% CI: 0.995-0.998, p<0.001), with this protective association remaining stable over time (EWG*Time: p=0.89). Conclusions: Higher Eatwell Guide adherence was associated with preserved muscle mass, modest attenuation of grip strength decline over time, and a reduced risk of falls, supporting its relevance for musculoskeletal health and physical function in ageing populations.

13
The Robust Bidirectional Association Between Chronic Lung Disease and Incident Osteoporosis: A Two-Stage Individual Participant Data Meta-Analysis of Three International Longitudinal Cohorts (HRS, SHARE, and ELSA)

Jiang, D.; Bao, J.

2026-03-19 respiratory medicine 10.64898/2026.03.18.26348689 medRxiv
Top 0.1%
3.7%

Background: The association between chronic lung disease (CLD) and osteoporosis (OP) is well-recognized, but the direction and magnitude of this relationship remain debated, particularly in aging populations. We aimed to quantify the bidirectional association between CLD (including COPD and asthma) and incident OP using a two-stage individual participant data (IPD) meta-analysis of three large longitudinal cohorts. Methods: We harmonized and analyzed individual-level data from the Health and Retirement Study (HRS, USA), the Survey of Health, Ageing and Retirement in Europe (SHARE, Europe), and the English Longitudinal Study of Ageing (ELSA, UK), all comprising adults aged ≥50 years. In the first stage, Cox proportional hazards models were fitted separately in each cohort to estimate hazard ratios (HRs) for the forward (CLD→OP) and reverse (OP→CLD) associations, adjusting for a comprehensive set of confounders (demographics, lifestyle, comorbidities, functional status). In the second stage, cohort-specific log HRs were pooled using fixed-effect meta-analysis. Heterogeneity was assessed with the I-squared statistic. Results: A total of 40,050 participants were included across the three cohorts. The pooled HR for incident OP among individuals with baseline CLD was 1.37 (95% confidence interval [CI] 1.24-1.51), with similar estimates for COPD (HR 1.47, 95% CI 1.27-1.69) and asthma (HR 1.35, 95% CI 1.22-1.50). For the reverse association, baseline OP was associated with increased risk of incident CLD (pooled HR 1.16, 95% CI 1.05-1.29), COPD (HR 1.28, 95% CI 1.11-1.47), and asthma (HR 1.17, 95% CI 1.05-1.30). Heterogeneity was low across all analyses (I² ≤ 7.5%). Conclusion: This two-stage IPD meta-analysis provides robust evidence of a bidirectional relationship between CLD and OP in older adults. These findings underscore the need for integrated screening and management of both conditions in aging populations.
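The second-stage pooling described in this abstract can be sketched as standard inverse-variance fixed-effect meta-analysis on the log hazard ratio scale, with I-squared derived from Cochran's Q. This is a minimal illustration under stated assumptions, not the authors' code: the function names `se_from_ci` and `pool_fixed_effect` are hypothetical, and the three cohort-level inputs in the example are made-up placeholders, since the abstract does not report cohort-specific estimates.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of a log HR from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def pool_fixed_effect(log_hrs, ses, z=1.96):
    """Inverse-variance fixed-effect pooling of cohort-specific log HRs."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q and the I-squared heterogeneity statistic (in percent).
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "hr": math.exp(pooled),
        "lower": math.exp(pooled - z * pooled_se),
        "upper": math.exp(pooled + z * pooled_se),
        "i2": i2,
    }


# Hypothetical cohort-level inputs (NOT values from the study).
example_log_hrs = [math.log(1.30), math.log(1.40), math.log(1.45)]
example_ses = [0.08, 0.10, 0.12]
result = pool_fixed_effect(example_log_hrs, example_ses)
print(result)
```

The helper `se_from_ci` can also be applied to a reported interval, for example `se_from_ci(1.24, 1.51)` for the pooled forward association, to recover the standard error of the corresponding log HR.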

14
Can Dietary Fibre Intake Reduce the Risk of Mental and Behavioral Disorders Due to Use of Tobacco in Smokers?

Qi, X.; Qi, H.; Li, N.; Wang, T.; Wang, W.; Song, X.; Mi, B.; Zhang, D.

2026-03-28 addiction medicine 10.64898/2026.03.26.26349460 medRxiv
Top 0.1%
3.6%

Background and aims: Mental and behavioral disorders due to use of tobacco (MBDT) present a critical challenge to global health, yet modifiable lifestyle factors for reducing its risk remain poorly understood. Given that dietary fibre can affect mental health through gut-brain communication, we sought to explore how fibre intake relates to MBDT risks in smokers. Methods: We specifically evaluated the link between dietary fibre intake and MBDT within a smoking population. Utilizing the UK Biobank (UKB) database, we performed cross-sectional (N=19,943) and prospective cohort (N=19,885) evaluations applying logistic and Cox proportional hazards models, respectively. To determine potential causality, two-sample Mendelian randomization (MR) was applied, relying on GWAS summary data derived from the IEU Open GWAS Project and FinnGen repositories. Results: Cross-sectional findings indicated that individuals in the top quartile (Q4) of fibre intake exhibited decreased MBDT risks relative to the bottom quartile (Q1) (OR: 0.32, 95% CI: 0.13-0.79). Over a median observation time of 12.84 years, the prospective evaluation demonstrated a notable inverse correlation (Q4 HR: 0.46, 95% CI: 0.40-0.54). Non-linear modeling via restricted cubic splines uncovered an L-shaped dose-response curve. Furthermore, MR results confirmed a genetically predicted protective causality (IVW OR: 0.68, 95% CI: 0.49-0.95), which remained consistent across sensitivity validations. Conclusions: Among smokers, higher dietary fibre intake is robustly associated with a reduced risk of mental and behavioral disorders due to the use of tobacco, offering a modifiable dietary target for public health interventions.

15
Direct and mediated effects (DME) SLCMA: a novel method for life course modelling with time-varying covariates

Beer, S.; Simpkin, A. J.; Eldeeb, S. Y.; Zar, H. J.; Stein, D. J.; Dunn, E. C.; Smith, A. D. A. C.

2026-06-06 epidemiology 10.64898/2026.05.29.26354427 medRxiv
Top 0.1%
3.6%

Background: In prospective cohort studies, where an exposure is collected repeatedly, interest often lies in determining whether the timing of that exposure has a differential effect on a later outcome. The Structured Life Course Modeling Approach (SLCMA), where users select between temporal hypotheses of exposure specified a priori, provides one way to analyse such longitudinal data. However, few studies using SLCMA consider the effect of time-varying covariates (TVC) which may impact associations. Methods: We present a modified version of the SLCMA - called direct and mediated effects (DME)-SLCMA - which corrects for TVC. We first develop the DME-SLCMA method, test it through simulation, and apply it to psychosocial data from the Drakenstein Child Health Study (DCHS, n=336) to investigate relationships between maternal psychopathology, TVC of socioeconomic status, and offspring depressive symptoms. Results: We found that, on average, offspring depressive symptoms score increased by 3.9% (95% CI: 1.0%-6.9%, p = 0.039) for each unit of maternal psychopathology (SRQ) at 48 months whilst adjusting for time-varying socioeconomic status (at 18, 30, 42 and 54 months). Our simulations identified several realistic scenarios where selections ignoring TVC - with TVC mediated exposure effects present - were prone to be incorrect, including our DCHS example. Conclusion: DME-SLCMA is a robust new approach for life course modelling in the presence of time-varying covariates. We recommend adjusting for TVC whenever possible, and, when not possible, our simulation study identified that scenarios where mediated effects are comparable, or greater, in magnitude to direct effects are most prone to confounding.

16
Cardiometabolic health trajectories from birth to old age based on multi-decadal series of biochemistry and anthropometry

Makinen, V.-P.; Kahonen, M.; Lehtimaki, T.; Hutri, N.; Ronnemaa, T.; Viikari, J.; Pahkala, K.; Rovio, S.; Niinikoski, H.; Mykkanen, J.; Raitakari, O.; Ala-Korpela, M.

2026-04-07 epidemiology 10.64898/2026.04.01.26349266 medRxiv
Top 0.1%
3.5%

Background and aims: Direct evidence to connect early life metabolism with cardiometabolic diseases in old age is limited due to the rarity of multi-decadal biochemical follow-up studies. To gain deeper insight into metabolic ageing, we conducted a longitudinal study that integrates serial data on clinical biomarkers, metabolomics and clinical events across the human life course. Methods: Children born in 1962-1992 were included from four European cohorts. Time-series of clinical biomarkers and metabolomics data were available for 8,653 participants (ages 0-49 years, 142 molecular and four physiological variables). Comparable data for 13,795 UK Biobank participants at two visits (ages 40-79 years) were linked with retrospective and prospective records of diabetes and cardiovascular disease. Lifetime metabolic trajectories were reconstructed by unsupervised machine learning and local polynomial regression. Results: A stable stratification in metabolic health emerged in children between ages 3 and 12 years and persisted to old age. We summarized this population pattern by assigning each participant into one of seven metabolic subgroups with characteristic biomarker trajectories. Two subgroups (MetDys TG+ and MetDys TG-) featured increased waist-height ratio from childhood, persistently higher C-reactive protein throughout life and rapidly increasing fasting insulin between 30 and 49 years of age. Both subgroups exhibited high risk for diabetes (HR > 13) and ischemic heart disease (HR > 2.5) when compared against the lowest risk subgroup (High HDL ApoB-). Conclusions: This life-course analysis shows that metabolic dysfunction associated with excess weight gain begins in early childhood and is associated with cardiometabolic morbidity in later life.

17
Can large language models approximate human perceptions of disease severity? An evaluation using Global Burden of Disease 2010 disability weights

Ha, Y.; Park, H.; Lee, Y.; Kim, S.; Ahn, S.

2026-05-04 health informatics 10.64898/2026.05.02.26352261 medRxiv
Top 0.2%
3.5%

Background: Disability weights (DWs) quantify the severity of health loss and are essential for estimating disability-adjusted life years in the Global Burden of Disease (GBD) framework. Conventional DW estimation relies on resource-intensive population surveys that are difficult to update or adapt to emerging health states. Large language models (LLMs) may offer a scalable alternative by approximating human perceptions of disease severity through structured judgment tasks. Methods: This exploratory study evaluated the alignment between LLM-derived and human-derived DW rankings using 222 health states from GBD 2010. All possible pairwise comparisons (24,531 pairs, each repeated three times) were conducted across four LLMs (GPT-5 mini, GPT-5, Claude Haiku 4.5, and Claude Sonnet 4.5). DWs were estimated via probit regression and evaluated using Spearman's rank correlation and Steiger's z test. The effects of prompt language (English vs. Korean), cultural role prompting, and medical specialist role prompting on alignment were examined. Additionally, the Binomial-Logit Indifference-Point (BLIP) estimator was proposed and validated through leave-one-out cross-validation for estimating DWs for health states without established values. Results: All four LLMs showed high rank correlation with GBD 2010 DWs (Spearman's ρ = 0.893 to 0.909), with no significant inter-model differences. Korean-language prompting significantly improved alignment with Korean DWs (ρ = 0.756 vs. 0.715, p = 0.011), and Korean cultural role prompting improved alignment with both GBD 2010 DWs (ρ = 0.922 vs. 0.909, p = 0.002) and Korean DWs (ρ = 0.738 vs. 0.715, p = 0.001). Medical specialist role prompting significantly reduced alignment with GBD 2010 DWs (ρ = 0.895 vs. 0.909, p = 0.001). BLIP demonstrated strong agreement with GBD 2010 DWs (Pearson's r = 0.862, MAE = 0.066) and produced plausible estimates for Long COVID (mild: 0.020, moderate: 0.298, severe: 0.529). Conclusions: LLMs can approximate human perceptions of disease severity with high rank-order consistency. Prompt language and role framing significantly influenced alignment, with culturally grounded lay prompting enhancing and specialist prompting reducing correspondence with population-based DWs. BLIP provides a practical framework for generating provisional DW estimates for emerging or underrepresented health states when conventional surveys are infeasible.
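The comparison count quoted in the Methods follows from the number of unordered pairs among 222 health states. The short sketch below checks that arithmetic; the helper name `pairwise_design` is hypothetical, and the per-model total assumes each model judged every pair on all three repeats, which the abstract implies but does not spell out.

```python
import math
from itertools import combinations

N_HEALTH_STATES = 222  # health states from GBD 2010, per the abstract
REPEATS = 3            # each pair repeated three times, per the abstract


def pairwise_design(n_states, repeats):
    """Return (unique unordered pairs, judgments per model).

    Assumes every pair is judged `repeats` times by a given model.
    """
    pairs = math.comb(n_states, 2)
    return pairs, pairs * repeats


pairs, per_model = pairwise_design(N_HEALTH_STATES, REPEATS)
print(pairs)      # 24531, matching the abstract
print(per_model)  # 73593 judgments per model under the stated assumption

# The same count obtained by enumerating pairs on a small toy set.
toy_pairs = list(combinations(range(5), 2))
print(len(toy_pairs))  # 10
```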

18
Social mobility and long-term episodic memory in Britain

Tampubolon, G.

2026-04-13 epidemiology 10.64898/2026.04.12.26350709 medRxiv
Top 0.2%
3.2%

Population ageing increases the importance of cognitive capacity for making decisions about retirement and living independently beyond it. We tested whether post-war educational expansion and working-life social mobility eliminate the association between social class of origin and cognition in early old age using the 1958 National Child Development Study. Two outcomes were analysed at age 62: standard episodic memory (immediate + delayed word recall) and long-term episodic memory, capturing accurate half-century recall of childhood household facts (rooms and people at age 11 validated against mothers' responses). Social mobility trajectories derived in prior work were classified into predominantly manual versus non-manual class trajectories. Models were estimated separately for women and men across three specifications: (i) social origin and controls, (ii) adding social mobility, and (iii) adding weighting to address healthy survivor bias. Education was consistently associated with both outcomes. For long-term episodic memory, social origin gradients were clearer than for short-term episodic memory, with men from service/professional origins showing a 13 percentage-point higher probability of accurate half-century recall than men from manual origins. These findings indicate that education expansion and working-life social mobility failed to release the grip of social origin on long-term episodic memory.

19
Causal estimands and target trials for the effect of lag time to treatment of cancer patients

Goncalves, B. P.; Franco, E. L.

2026-04-08 epidemiology 10.64898/2026.04.07.26350338 medRxiv
Top 0.2%
3.2%

Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.

20
Disentangling infectiousness and susceptibility by age group using transmission pair data: a study of SARS-CoV-2 household transmission

Leung, K. Y.; Miura, F.; Backer, J. A.

2026-06-05 epidemiology 10.64898/2026.06.04.26354892 medRxiv
Top 0.2%
3.1%

Background: Differential contributions to transmission across age groups have been reported for many respiratory infections, including SARS-CoV-2. They are crucial for estimating the impact of age-specific interventions. Disentangling these age-dependent contributions remains challenging, as they may reflect differences in contact rates, biological susceptibility, or infectiousness. Aim: We aim to jointly estimate age-specific per-contact infectiousness and susceptibility and their effect on the impact of age-specific interventions. Methods: The age-specific infectiousness and susceptibility were jointly estimated in a Bayesian framework by combining contact data with transmission pair data (who-infected-whom). We applied this approach to 197,840 self-reported household transmission pairs collected in the Netherlands during the COVID-19 pandemic. Using these estimates, we projected the expected impact of school closure and work-from-home measures during the early stages of an epidemic in the absence of other interventions. Results: Both infectiousness and susceptibility to SARS-CoV-2 infection were lowest in children aged 0-9 years and highest in adults over 30 years old, with 2- to 4.5-fold differences between these groups. Projected impacts of age-specific interventions indicated that school closures would reduce the reproduction number by 8% or 29% when age-specific susceptibility and infectiousness were or were not considered, respectively. Conversely, working-from-home policies would lead to reductions of 41% with and 20% without age-specific infectiousness and susceptibility. Conclusion: Our method enables robust estimation of age-specific infectiousness and susceptibility. Accounting for these age heterogeneities is essential for projecting the impact of age-targeted interventions. Our approach is adaptable to other respiratory infections and can guide more tailored public health responses.
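One common way to see why age-specific susceptibility and infectiousness change the projected effect of an intervention is a next-generation-matrix calculation, where the reproduction number is the dominant eigenvalue of a matrix combining contacts with per-contact susceptibility and infectiousness. The sketch below assumes that formulation; the abstract does not state the model structure the authors used, and all function names and numbers here are hypothetical placeholders rather than estimates from the study.

```python
def next_generation_matrix(susceptibility, contacts, infectiousness):
    """K[i][j]: relative number of infections in group i caused by one case in group j."""
    n = len(susceptibility)
    return [
        [susceptibility[i] * contacts[i][j] * infectiousness[j] for j in range(n)]
        for i in range(n)
    ]


def dominant_eigenvalue(matrix, iterations=1000, tol=1e-12):
    """Power iteration for a non-negative matrix; v is kept normalised to sum 1."""
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0:
            return 0.0
        v = [x / total for x in w]
        if abs(total - eigenvalue) < tol:
            return total
        eigenvalue = total
    return eigenvalue


def relative_reduction(baseline_contacts, reduced_contacts, susceptibility, infectiousness):
    """Fractional drop in the reproduction number when contacts are reduced."""
    r_base = dominant_eigenvalue(
        next_generation_matrix(susceptibility, baseline_contacts, infectiousness))
    r_new = dominant_eigenvalue(
        next_generation_matrix(susceptibility, reduced_contacts, infectiousness))
    return 1.0 - r_new / r_base


# Hypothetical two-group example (children, adults); values are illustrative only.
contacts = [[8.0, 3.0], [3.0, 6.0]]
school_closed = [[2.0, 3.0], [3.0, 6.0]]  # fewer child-child contacts
homogeneous = relative_reduction(contacts, school_closed, [1.0, 1.0], [1.0, 1.0])
heterogeneous = relative_reduction(contacts, school_closed, [0.4, 1.0], [0.4, 1.0])
print(homogeneous, heterogeneous)
```

In this toy setup, lowering children's susceptibility and infectiousness shrinks the projected effect of removing child-child contacts, which is the qualitative direction the abstract reports for school closures.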